Thompson Sampling for Combinatorial Network Optimization in Unknown Environments

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Thompson Sampling for Combinatorial Semi-Bandits

We study the application of the Thompson Sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distributiondependent regret bound of O(m log T/∆min) for TS under general CMAB, where m is the number of arms, T is the time horizon, and ∆min is the minimum gap between the expect...

متن کامل

Learning Unknown Markov Decision Processes: A Thompson Sampling Approach

We consider the problem of learning an unknown Markov Decision Process (MDP) that is weakly communicating in the infinite horizon setting. We propose a Thompson Sampling-based reinforcement learning algorithm with dynamic episodes (TSDE). At the beginning of each episode, the algorithm generates a sample from the posterior distribution over the unknown model parameters. It then follows the opti...

متن کامل

Thompson Sampling is Asymptotically Optimal in General Environments

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable classes of general stochastic environments. These environments can be nonMarkov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability as...

متن کامل

Thompson Sampling for Dynamic Pricing

In this paper we apply active learning algorithms for dynamic pricing in a prominent e-commerce website. Dynamic pricing involves changing the price of items on a regular basis, and uses the feedback from the pricing decisions to update prices of the items. Most popular approaches to dynamic pricing use a passive learning approach, where the algorithm uses historical data to learn various param...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: IEEE/ACM Transactions on Networking

سال: 2020

ISSN: 1063-6692,1558-2566

DOI: 10.1109/tnet.2020.3025904